Fault-tolerant meshes with minimal numbers of spares
Abstract
This paper presents several techniques for adding fault tolerance to distributed-memory parallel computers. More formally, given a target graph with n nodes, we create a fault-tolerant graph with n + k nodes such that, given any set of k or fewer faulty nodes, the remaining graph is guaranteed to contain the target graph as a fault-free subgraph. As a result, any algorithm designed for the target graph will run with no slowdown in the presence of k or fewer node faults, regardless of their distribution. We present fault-tolerant graphs for target graphs which are 2-dimensional meshes, tori, eight-connected meshes and hexagonal meshes. In all cases our fault-tolerant graphs have smaller degree than any previously known graphs with the same properties.
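The definition above can be checked by brute force on very small graphs. The following is a minimal sketch, not the paper's construction: the function names are hypothetical, the target is a 4-node linear array rather than a mesh, and the search is exponential, so it only illustrates what "k-fault-tolerant" means.

```python
from itertools import combinations, permutations


def contains_subgraph(host_nodes, host_edges, target_n, target_edges):
    """Brute force: try every injective map of target nodes 0..target_n-1 into host_nodes."""
    host = {frozenset(e) for e in host_edges}
    for image in permutations(host_nodes, target_n):
        # Every target edge must land on an edge of the host graph.
        if all(frozenset((image[u], image[v])) in host for u, v in target_edges):
            return True
    return False


def is_k_fault_tolerant(ft_n, ft_edges, target_n, target_edges, k):
    """True if removing any k or fewer nodes still leaves the target as a subgraph."""
    nodes = range(ft_n)
    for r in range(k + 1):
        for faulty in combinations(nodes, r):
            alive = [v for v in nodes if v not in faulty]
            alive_edges = [(u, v) for u, v in ft_edges
                           if u not in faulty and v not in faulty]
            if not contains_subgraph(alive, alive_edges, target_n, target_edges):
                return False
    return True


def path_edges(n):
    """Edges of a linear array (path) on nodes 0..n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def cycle_edges(n):
    """Edges of a ring (cycle) on nodes 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


n, k = 4, 1
# A ring with n + k nodes tolerates any single fault for an n-node linear array.
print(is_k_fault_tolerant(n + k, cycle_edges(n + k), n, path_edges(n), k))  # True
# A longer linear array does not: losing an interior node splits it.
print(is_k_fault_tolerant(n + k, path_edges(n + k), n, path_edges(n), k))   # False
```

The ring example uses exactly k = 1 spare node, which mirrors the n + k node budget in the abstract; the mesh constructions themselves are not reproduced here.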
Similar resources
Fault-Tolerant Meshes and Hypercubes with Minimal Numbers of Spares
Many parallel computers consist of processors connected in the form of a d-dimensional mesh or hypercube. Two- and three-dimensional meshes have been shown to be efficient in manipulating images and dense matrices, whereas hypercubes have been shown to be well suited to divide-and-conquer algorithms requiring global communication. However, even a single faulty processor or communication link can s...
Fault-Tolerant Adaptive and Minimal Routing in Mesh-Connected Multicomputers Using Extended Safety Levels
The minimal routing problem in mesh-connected multicomputers with faulty blocks is studied. Two-dimensional meshes are used to illustrate the approach. A sufficient condition for minimal routing in 2D meshes with faulty blocks is proposed. Unlike many traditional models that assume all the nodes know global fault distribution, our approach is based on the concept of an extended safety level, w...
Fault tolerant system with imperfect coverage, reboot and server vacation
This study is concerned with the performance modeling of a fault-tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line units as well as the warm standby units are subject to failure and are sent for repair to a repair facility with a single repairman who is himself prone to failure. If the failed unit is not detected, the system enters into an unsafe...
A Fault-Tolerant Adaptive and Minimal Routing Approach in 3-D Meshes
In this paper, a sufficient condition is given for minimal routing in n-dimensional (n-D) meshes with faulty nodes contained in a set of disjoint fault regions. It is based on earlier work by the author on minimal routing in low-dimensional meshes (such as 2-D meshes with faulty blocks). Unlike many traditional models that assume all the nodes know global fault distribution, our approach is based...
A Fault-tolerant Adaptive and Minimal Routing Scheme in $n$-D Meshes
In this paper, a sufficient condition is given for minimal routing in n-dimensional (n-D) meshes with faulty nodes contained in a set of disjoint fault regions. It is based on earlier work by the author on minimal routing in low-dimensional meshes (such as 2-D meshes with faulty blocks). Unlike many traditional models that assume all the nodes know global fault distribution, our approach is bas...